Goto

Collaborating Authors

 causal analysis



Causal analysis of Covid-19 Spread in Germany

Neural Information Processing Systems

In this work, we study the causal relations among German regions in terms of the spread of Covid-19 since the beginning of the pandemic, taking into account the restriction policies that were applied by the different federal states. We loose a strictly formulated assumption for a causal feature selection method for time series data, robust to latent confounders, which we subsequently apply on Covid-19 case numbers.


A Causal Analysis of Harm

Neural Information Processing Systems

As autonomous systems rapidly become ubiquitous, there is a growing need for a legal and regulatory framework toaddress when and how such a system harms someone. There have been several attempts within the philosophy literature to define harm, but none of them has proven capable of dealing with with the many examples that have been presented, leading some to suggest that the notion of harm should be abandoned and ``replaced by more well-behaved notions''. As harm is generally something that is caused, most of these definitions have involved causality at some level. Yet surprisingly, none of them makes use of causal models and the definitions of actual causality that they can express. In this paper we formally define a qualitative notion of harm that uses causal models and is based on a well-known definition of actual causality (Halpern, 2016). The key novelty of our definition is that it is based on contrastive causation and uses a default utility to which the utility of actual outcomes is compared. We show that our definition is able to handle the examples from the literature, and illustrate its importance for reasoning about situations involving autonomous systems.


From 'What-is' to 'What-if' in Human-Factor Analysis: A Post-Occupancy Evaluation Case

Chen, Xia, Sun, Ruiji, Geyer, Philipp, Borrmann, André, Schiavon, Stefano

arXiv.org Artificial Intelligence

Human-factor analysis typically employs correlation analysis and significance testing to identify relationships between variables. However, these descriptive ('what-is') methods, while effective for identifying associations, are often insufficient for answering causal ('what-if') questions. Their application in such contexts often overlooks confounding and colliding variables, potentially leading to bias and suboptimal or incorrect decisions. We advocate for explicitly distinguishing descriptive from interventional questions in human-factor analysis, and applying causal inference frameworks specifically to these problems to prevent methodological mismatches. This approach disentangles complex variable relationships and enables counterfactual reasoning. Using post-occupancy evaluation (POE) data from the Center for the Built Environment's (CBE) Occupant Survey as a demonstration case, we show how causal discovery reveals intervention hierarchies and directional relationships that traditional associational analysis misses. The systematic distinction between causally associated and independent variables, combined with intervention prioritization capabilities, offers broad applicability to complex human-centric systems, for example, in building science or ergonomics, where understanding intervention effects is critical for optimization and decision-making.


Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention

Xu, Zhe, Wang, Zhicai, Wu, Junkang, Lu, Jinda, Wang, Xiang

arXiv.org Artificial Intelligence

Large Vision-Language Models (L VLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We propose this primarily stems from spurious correlations arising when models strongly associate highly co-occurring objects during training, leading to hallucinated objects influenced by visual context. Current benchmarks mainly focus on hallucination detection but lack a formal characterization and quantitative evaluation of spurious correlations in L VLMs. To address this, we introduce causal analysis into the object recognition scenario of L VLMs, establishing a Structural Causal Model (SCM). Utilizing the language of causality, we formally define spurious correlations arising from co-occurrence bias. To quantify the influence induced by these spurious correlations, we develop Causal-HalBench, a benchmark specifically constructed with counterfactual samples and integrated with comprehensive causal metrics designed to assess model robustness against spurious correlations. Concurrently, we propose an extensible pipeline for the construction of these counterfactual samples, leveraging the capabilities of proprietary L VLMs and Text-to-Image (T2I) models for their generation. Our evaluations on mainstream L VLMs using Causal-HalBench demonstrate these models exhibit susceptibility to spurious correlations, albeit to varying extents.


A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data

Kim, Dana, Xu, Yichen, Lin, Tiffany

arXiv.org Machine Learning

Large Language Models (LLMs) offer a flexible means to generate synthetic tabular data, yet existing approaches often fail to preserve key causal parameters such as the average treatment effect (ATE). In this technical exploration, we first demonstrate that state-of-the-art synthetic data generators, both GAN- and LLM-based, can achieve high predictive fidelity while substantially misestimating causal effects. To address this gap, we propose a hybrid generation framework that combines model-based covariate synthesis (monitored via distance-to-closest-record filtering) with separately learned propensity and outcome models, thereby ensuring that (W, A, Y) triplets retain their underlying causal structure. We further introduce a synthetic pairing strategy to mitigate positivity violations and a realistic evaluation protocol that leverages unlimited synthetic samples to benchmark traditional estimators (IPTW, AIPW, substitution) under complex covariate distributions. This work lays the groundwork for LLM-powered data pipelines that support robust causal analysis. Our code is available at https://github.com/Xyc-arch/llm-synthetic-for-causal-inference.git.


Group Interventions on Deep Networks for Causal Discovery in Subsystems

Ahmad, Wasim, Denzler, Joachim, Shadaydeh, Maha

arXiv.org Artificial Intelligence

Causal discovery uncovers complex relationships between variables, enhancing predictions, decision-making, and insights into real-world systems, especially in nonlinear multivariate time series. However, most existing methods primarily focus on pairwise cause-effect relationships, overlooking interactions among groups of variables, i.e., subsystems and their collective causal influence. In this study, we introduce gCDMI, a novel multi-group causal discovery method that leverages group-level interventions on trained deep neural networks and employs model invariance testing to infer causal relationships. Our approach involves three key steps. First, we use deep learning to jointly model the structural relationships among groups of all time series. Second, we apply group-wise interventions to the trained model. Finally, we conduct model invariance testing to determine the presence of causal links among variable groups. We evaluate our method on simulated datasets, demonstrating its superior performance in identifying group-level causal relationships compared to existing methods. Additionally, we validate our approach on real-world datasets, including brain networks and climate ecosystems. Our results highlight that applying group-level interventions to deep learning models, combined with invariance testing, can effectively reveal complex causal structures, offering valuable insights for domains such as neuroscience and climate science.


Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes

Santos, Pedro Cortes dos, Rocha, Matheus Becali, Krohling, Renato A

arXiv.org Artificial Intelligence

Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tackles such difficulties using the Tennessee Eastman Process, a well-established benchmark known for its intricate dynamics, to develop an innovative fault detection framework. Initial attempts with standard models revealed limitations in both performance and interpretability, prompting a shift toward a more tractable approach. By employing SHAP (SHapley Additive exPlanations), we transform the problem into a more manageable and transparent form, pinpointing the most critical process features driving fault predictions. This reduction in complexity unlocks the ability to apply causal analysis through Directed Acyclic Graphs, generated by multiple algorithms, to uncover the underlying mechanisms of fault propagation. The resulting causal structures align strikingly with SHAP findings, consistently highlighting key process elements-like cooling and separation systems-as pivotal to fault development. Together, these methods not only enhance detection accuracy but also provide operators with clear, actionable insights into fault origins, a synergy that, to our knowledge, has not been previously explored in this context. This dual approach bridges predictive power with causal understanding, offering a robust tool for monitoring complex manufacturing environments and paving the way for smarter, more interpretable fault detection in industrial systems.